Summary

The solution of a block tridiagonal matrix using parallel 
processing is demonstrated in this report. The multiprocessor 
system which obtained the results and the software envi- 
ronment used to program that system are described. 
Theoretical partitioning and resource allocation for the 
Gaussian elimination method used to solve the matrix are 
discussed. The results obtained from running one-, two-, and 
three-processor versions of the block tridiagonal solver are 
presented. The PASCAL source code for these solvers is given 
in the appendix, and it may be transportable to other shared- 
memory parallel processors, provided that the synchronization 
routines are reproduced on the target system. 

Introduction 

Many computationally intensive problems can benefit from 
the use of parallel processing. One such problem, common 
to many fluid mechanics and structural dynamics applications, 
is the solution of large matrix equations. Because of the 
differencing techniques used in solving the partial differential 
equations that describe fluids and structures systems, the 
resulting matrices often exhibit a block tridiagonal structure. 
The block tridiagonal matrix requires much less computation 
to solve than a full N by N matrix. A full matrix requires
approximately $N^3$ operations to solve; a block tridiagonal
matrix requires approximately N operations.

Although the block tridiagonal structure significantly reduces 
computational effort, considerable time is still spent in the 
matrix solution. This is especially true in many iterative 
linearization techniques, such as Newton-Raphson, where a 
full matrix solution is required for every iteration. Because 
of this, other parallel processing techniques which can further 
reduce the amount of computation required to arrive at a 
solution should be investigated. 

This paper presents the solution of a block tridiagonal matrix 
on a parallel processor. The block tridiagonal equations 
analyzed were taken from a transient rotor dynamics simulation 
program (ref. 1). In this program, Gaussian elimination is used 
to solve the matrix. 

The real-time multiprocessor simulator (RTMPS) was used 
to solve these equations in parallel (refs. 2 to 5). The RTMPS 
is a parallel processor designed to do real-time simulation of 
dynamic systems. The hardware consists of dual busses with
processors on each bus. A dual-port memory provides
communication between the two busses by connecting processors on
one bus to the processors on the other bus. Considerable 
software support is provided for one-dimensional scalar 
problems by a real-time multiprocessor language (RTMPL) 
and a real-time multiprocessor operating system (RTMPOS). 

The potential of parallel processing for improving the per- 
formance of linear algebra routines has prompted a significant 
amount of research (refs. 6 to 8). Also, a significant amount 
of literature exists on the use of vector processors for linear 
algebra. Since vectorization of code involves the identification 
of the lowest level of parallelism (e.g., operation level 
parallelism), the principles behind both areas of research are 
very similar. Because of the high percentage of nested loops
in linear algebra code, the ideal architecture for most linear 
algebra applications would consist of multiple vector processors. 

This paper presents the application of parallel processing 
using one particular architecture (RTMPS) to one algorithm 
(Gaussian elimination). This combination, however, may not 
be the best approach to the problem. As mentioned previously, 
there are other architectures and algorithms that may be better 
suited for this application. The RTMPS system was used for 
this study because it was the only parallel processing hardware 
conveniently available. The intent of this study is to identify 
some practical aspects of implementing a commonly used 
algorithm on a parallel processor. The investigation of other 
architectures and algorithms will be the focus of future research. 

Problem Description 

The structure of the block tridiagonal matrix is shown in 
figure 1. Each block row, except the first and last, consists 
of three M by M blocks. There are N block rows total,
including the first and last. If this matrix is called A, then the
general problem is to find the solution to the system of
equations

$$A\mathbf{x} = \mathbf{b}$$

where x and b are vectors, N × M elements in length.

A common method for solving this system is to perform a 
forward elimination of all coefficients below the diagonal and 
then a back substitution to solve for the vector x. This 
procedure, called Gaussian elimination, is illustrated in the 
following example for a 3 by 3 matrix. 
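$$
\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
=
\begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix}
$$

Step 1 (divide row 1 by $a_{11}$, then eliminate the first column; $\leftarrow$ denotes overwriting a stored value):

$$
\begin{aligned}
& a_{11} \leftarrow a_{11}/a_{11} = 1, \quad a_{12} \leftarrow a_{12}/a_{11}, \quad a_{13} \leftarrow a_{13}/a_{11}, \quad b_1 \leftarrow b_1/a_{11} \\
& a_{22} \leftarrow a_{22} - a_{21}a_{12}, \quad a_{23} \leftarrow a_{23} - a_{21}a_{13}, \quad b_2 \leftarrow b_2 - a_{21}b_1 \\
& a_{32} \leftarrow a_{32} - a_{31}a_{12}, \quad a_{33} \leftarrow a_{33} - a_{31}a_{13}, \quad b_3 \leftarrow b_3 - a_{31}b_1
\end{aligned}
$$

After step 1 the system is

$$
\begin{bmatrix} 1 & a_{12} & a_{13} \\ 0 & a_{22} & a_{23} \\ 0 & a_{32} & a_{33} \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
=
\begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix}
$$

Step 2 (divide row 2 by $a_{22}$, then eliminate the second column):

$$
\begin{aligned}
& a_{22} \leftarrow a_{22}/a_{22} = 1, \quad a_{23} \leftarrow a_{23}/a_{22}, \quad b_2 \leftarrow b_2/a_{22} \\
& a_{33} \leftarrow a_{33} - a_{32}a_{23}, \quad b_3 \leftarrow b_3 - a_{32}b_2
\end{aligned}
$$

After step 2 the system is

$$
\begin{bmatrix} 1 & a_{12} & a_{13} \\ 0 & 1 & a_{23} \\ 0 & 0 & a_{33} \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
=
\begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix}
$$

Step 3 (divide row 3 by $a_{33}$):

$$
a_{33} \leftarrow a_{33}/a_{33} = 1, \quad b_3 \leftarrow b_3/a_{33}
$$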
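The 3 by 3 steps above generalize directly to an n by n system. The following Python sketch uses a hypothetical helper, `forward_eliminate`, which is not code from the report. It performs the same divide-then-multiply-and-subtract sequence in place and counts the operations. Like the example, it does no pivoting. It also computes the below-diagonal entries that become zero, so for n = 3 it tallies 9 divides, 11 multiplies, and 11 subtracts. That is the 31 operations counted later for the data flow diagram.

```python
def forward_eliminate(a, b):
    """Reduce a*x = b to unit upper triangular form in place, following the
    3-by-3 example: divide the pivot row by its pivot, then multiply and
    subtract it from every row below. No pivoting (assumes nonzero pivots).
    Returns (divides, multiplies, subtracts)."""
    n = len(a)
    divides = multiplies = subtracts = 0
    for k in range(n):
        pivot = a[k][k]                      # saved before a[k][k] becomes 1
        # Divide stage: a_kk/a_kk, ..., a_kn/a_kk and b_k/a_kk
        for j in range(k, n):
            a[k][j] /= pivot
            divides += 1
        b[k] /= pivot
        divides += 1
        # Multiply-and-subtract stage, including the entry that becomes zero
        for r in range(k + 1, n):
            factor = a[r][k]
            for j in range(k, n):
                a[r][j] -= factor * a[k][j]
                multiplies += 1
                subtracts += 1
            b[r] -= factor * b[k]
            multiplies += 1
            subtracts += 1
    return divides, multiplies, subtracts
```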


Gaussian elimination is efficiently performed on a block 
tridiagonal matrix by applying a partial elimination process 
to four adjacent blocks at a time (fig. 2). The process begins 
with blocks 2 and 3 from the first block row and blocks 1 and 
2 from the next block row. The Gaussian elimination procedure 
is applied to the matrix determined by these four blocks. 
However, the process stops once block 2 in the first block row 
is made upper right triangular. As a result of this process, block 
1 in block row 2 is zero at this time. 

This process is then repeated on the next group of blocks 
starting in the next block row and continuing for the whole 
matrix, moving the four-block template down through the 
matrix one block row at a time. Thus, by repeating a partial
2M by 2M Gaussian elimination N times, the block tridiagonal
matrix is transformed to upper right triangular form.

After the matrix has been transformed to an upper right 
triangular matrix, the result vector x can be solved by using 
back substitution starting from the bottom of the matrix. This 
is done by solving for the last element of the result vector x, 
and substituting that value into the equation for the second last 
element of x (next row up). Now, two values of the result 
vector are available for substitution into the equation for the 
third last element. The procedure is repeated one row at a time, 
proceeding upward through the matrix until all elements of 
x have been solved. The following equation illustrates the back- 
substitution process for the example problem. 


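$$
\begin{aligned}
x_3 &= b_3 \\
x_2 &= b_2 - a_{23}x_3 \\
x_1 &= b_1 - a_{13}x_3 - a_{12}x_2
\end{aligned}
$$

where the $a_{ij}$ and $b_i$ are the values left by the forward elimination.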
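The bottom-up substitution above can be written as a short loop. This Python sketch uses a hypothetical function name and assumes, as in the example, that the eliminated matrix has ones on its diagonal. It solves each unknown from the last row upward. Each row needs only values that have already been found.

```python
def back_substitute(u, y):
    """Solve u*x = y for a unit upper triangular u (ones on the diagonal),
    starting from the last row and working upward."""
    n = len(u)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        # x_i = y_i minus the already-known terms right of the diagonal
        x[i] = y[i] - sum(u[i][j] * x[j] for j in range(i + 1, n))
    return x
```

For example, `back_substitute([[1.0, 0.5, 0.5], [0.0, 1.0, 1.0], [0.0, 0.0, 1.0]], [2.0, 2.0, 1.0])` gives x3 = 1, then x2 = 2 - 1 = 1, then x1 = 2 - 0.5 - 0.5 = 1.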



Figure 1. —Structure of block tridiagonal matrix. 
Figure 2.— Gaussian elimination for block tridiagonal matrix. (a) Step 1. (b) Step 2. (c) Step N.


The block tridiagonal matrix in the rotor dynamics
application consists of 30 block rows, and each block
is 4 by 4. Thus, N = 30 and M = 4 for this application.
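The four-block template can be sketched in Python as follows. This is a hypothetical pure-Python illustration, not the PASCAL code from the appendix. For simplicity it stores the matrix densely as a list of lists, where the report uses blocked arrays. For each block row (the IB loop), it performs M pivots (the IP loop). Each pivot is restricted to a 2M by 2M window made of the diagonal and right blocks of the current row and the left and diagonal blocks of the next row. Like the report, it does no pivoting, so it assumes nonzero pivots, for example a diagonally dominant matrix.

```python
def solve_block_tridiagonal(a, b, n_blocks, m):
    """Solve a*x = b for a block tridiagonal a (n_blocks block rows of
    m-by-m blocks), stored densely. Inputs are not modified."""
    size = n_blocks * m
    a = [row[:] for row in a]                  # work on copies
    b = list(b)
    for ib in range(n_blocks):                 # IB loop: one block row at a time
        col0 = ib * m                          # first column of the diagonal block
        end = min(col0 + 2 * m, size)          # window: 2M rows by 2M columns
        for ip in range(m):                    # IP loop: pivots of this block row
            p = col0 + ip
            pivot = a[p][p]
            for j in range(p, end):            # divide cycle
                a[p][j] /= pivot
            b[p] /= pivot
            for r in range(p + 1, end):        # multiply-and-subtract cycle
                factor = a[r][p]
                for j in range(p, end):
                    a[r][j] -= factor * a[p][j]
                b[r] -= factor * b[p]
    x = [0.0] * size                           # back substitution, bottom up
    for i in range(size - 1, -1, -1):
        last = min((i // m + 2) * m, size)     # unit diagonal block plus right block
        x[i] = b[i] - sum(a[i][j] * x[j] for j in range(i + 1, last))
    return x
```

The pivot rows of the current block row are zero beyond the window. The third block of the next row is therefore never touched, which is why a 2M by 2M window is enough. With N = 30 and M = 4, the call is `solve_block_tridiagonal(a, b, 30, 4)`.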

Partitioning Approach 

An approach for parallelizing the Gaussian elimination 
procedure was developed by examining the data flow of the 
problem. A data flow diagram for the 3- by 3-matrix example 
is shown in figure 3. The circles represent mathematical
operations, and the interconnections show the flow of data
between calculations. For the 3- by 3-matrix example, 31
operations must be performed. A single computer can only 
execute these operations one at a time. The data flow diagram 
suggests that, if several computers are available, multiple 
operations could be done concurrently. The five stages of the 
computations are bracketed on the right side of figure 3. Within 
each stage, each vertical operation stream can be done in 
parallel. Stages 2 and 4 have streams of two operations each, 
while all other stages have streams of only one operation. 

Most parallelism exists in the second stage, where eight 
operation streams can be done in parallel. If eight processors 
were available, the 16 operations of stage 2 could be done in 
a net count of two operations. Stage 1 would require four 
processors and could be done in a net count of one operation. 
Stages 3 and 4 would require three processors and could be 
done in net counts of one and two operations, respectively. 
Finally, stage 5 requires two processors and could be done 
in one operation. The minimum count for execution of the 
entire problem is the critical path. The critical path is the 
longest of the parallel operation streams in the data flow graph. 
In this example, the critical path is seven operations. Since 
each stage is done serially, only the maximum number of 
processors in any stage would be required (eight in this case). 
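The stage bookkeeping above can be checked mechanically. This Python sketch uses hypothetical helper names. It assumes one stream per updated entry, as in figure 3, and lists each stage of the n by n elimination as (streams, operations per stream). It then derives the total operation count, the critical path, and the maximum number of processors. For n = 3 it reproduces 31 operations, a critical path of 7, and 8 processors.

```python
def elimination_stages(n):
    """Data-flow stages for eliminating an n-by-n system with one
    right-hand side. Each entry is (number_of_streams, ops_per_stream)."""
    stages = []
    for k in range(1, n + 1):
        width = n - k + 2                    # entries in pivot row k, plus b_k
        stages.append((width, 1))            # divide stage: one divide per stream
        rows_below = n - k
        if rows_below:
            # one multiply-then-subtract stream per updated entry below the pivot
            stages.append((rows_below * width, 2))
    return stages


def summarize(stages):
    total = sum(streams * ops for streams, ops in stages)
    critical_path = sum(ops for _, ops in stages)      # stages run one after another
    max_processors = max(streams for streams, _ in stages)
    return total, critical_path, max_processors
```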

The data flow diagram for the back-substitution process is 
shown in figure 4. There are two stages, and the critical path 
is four operations. The maximum number of processors 
required is two. 


Figure 3.— Data flow diagram for Gaussian elimination (3 by 3 matrix).


Figure 4.— Data flow diagram for back substitution (3 by 3 matrix).

The solution of the block tridiagonal matrix contains the 
same parallelism described for the 3- by 3-matrix example. 
In the solution process, a partial elimination is performed on
a 2M by 2M system N times. The maximum number of
processors required would be a function of M. The critical path
would be N multiplied by the critical path operation count for
partial Gaussian elimination plus the critical path count for
the back substitution. The data flow diagram would follow the
same pattern as that of the 3 by 3 matrix; only its length and
width would vary with the size of the matrix. A detailed analysis
is given in the Theoretical Speedup Analysis section.

A PASCAL-coded version of the single-processor matrix 
solver used in the rotor dynamics simulation is given in the 
appendix. This is a direct PASCAL translation of the 
FORTRAN code used in the simulation. The procedures 
GETINF, IDATA, and IDATF are related to I/O on the unique
hardware used for this study. The purpose of these procedures 
is described later in this report. The parallel structures 
discussed previously can be seen in the main body of the code. 
There are two main loops in the program. The outer loop (IB) 
cycles through the block rows of the matrix. The next loop 
(IP) does the partial Gaussian elimination on the 8 by 8 
submatrix composed of blocks 2 and 3 in the current block 
row and blocks 1 and 2 in the next block row. Within the IP 
loop are six smaller loops which essentially perform the 
operations diagrammed in the data flow graph in figure 3. The
first two loops perform the divide operations, and the next 
four loops perform the multiply and subtract operations. As 
shown in the data flow graph, all divides can be done in parallel 
followed by all multiply and subtracts being done in parallel. 
This process is represented by the bracketed rows 1 and 2. 
The sequence is repeated for all four IP iterations. 

The code for the back-substitution process is next in the
program. Since the original code was not written with parallel
processing in mind, there are no operations which can be done 
in parallel while using the code shown. Each result vector 
element is found by solving one row at a time. Each iteration 
of the outermost loop (IBI) depends on results from the 
previous iteration. The same is true for the next level loop 
(II). The innermost loops within the II loop are recursive in 
nature (the calculation of a variable depends on itself from 
a previous iteration) and, therefore, cannot be done in parallel. 

The data flow graph for the back substitution, however, 
shows that parallel operations can be done. Figure 4 shows 
that once an element of the result vector has been calculated, 
it can be used to calculate parts of the remaining elements. Thus,
partial sums of other result vector elements can be computed 
in parallel. This algorithm, called the column sweep, is 
described in reference 6. The column sweep algorithm requires 
a different coding approach than that used in the rotor dynamics 
version of the back-substitution process. A new version was 
coded and used for the two- and three-processor matrix solvers 
discussed later in this paper. The use of the column sweep 
algorithm exemplifies the type of analysis required for selecting 
an algorithm to run on a parallel processor. 
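The column sweep can be sketched in Python as follows. The function name and the `nproc` parameter are illustrative assumptions, and the processors are only simulated. Once an element x_k is known, the updates y_i -= u_ik * x_k for all rows i above k are independent of one another. The sketch deals them out to `nproc` simulated processors using the iteration-allocation pattern described in the Resource Allocation section (rows pid, pid + nproc, and so on). It also counts the parallel stages. For the 3 by 3 example this gives two stages of a multiply and a subtract, matching the critical path of four operations in figure 4.

```python
def column_sweep(u, y, nproc=1):
    """Column-sweep back substitution for a unit upper triangular u.

    After x[k] is known, the updates y[i] -= u[i][k] * x[k] for i < k are
    mutually independent; here they are split among nproc simulated
    processors, rows pid, pid + nproc, ... Returns (x, number_of_sweeps)."""
    n = len(u)
    y = list(y)                      # partial sums, updated column by column
    x = [0.0] * n
    sweeps = 0
    for k in range(n - 1, -1, -1):
        x[k] = y[k]                  # unit diagonal: the partial sum is now final
        if k == 0:
            break
        for pid in range(nproc):     # each simulated processor's share of rows
            for i in range(pid, k, nproc):
                y[i] -= u[i][k] * x[k]
        sweeps += 1                  # one parallel multiply-and-subtract stage
    return x, sweeps
```

Every y_i receives its updates in the same order whatever the value of `nproc`, so the split changes only which processor does the work. It does not change the result.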

Theoretical Speedup Analysis 

The theoretical speedup for the parallel Gaussian elimination 
algorithm is computed by dividing the operation count for the 
serial version by the net operation count for the parallel 
version. An operation is one of the basic floating-point math 
operations: add, subtract, multiply, and divide. 

Table I shows the determination of the operation count for
the serial algorithm for one IB iteration of the forward
elimination procedure and for the back substitution of one
block row (four I iterations). The table assumes a 4 by 4 block
size. One IB iteration consists of four IP iterations, and the
operation count for each IP iteration depends on the value of IP.
For a matrix of 30 block rows, the operation count (OPS) would be

$$\text{OPS} = 30(\text{operations per IB iteration}) + 30(\text{back-substitution operations per block row}) = 30(370) + 30(44) = 12\,420 \text{ operations}$$

To simplify the analysis, it is assumed that the last block row 
is a full 8 by 8 matrix, although it is actually 4 by 8. 

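The operation count for an N block row, M- by M-block
tridiagonal matrix would be

$$\text{OPS} = N \sum_{i=1}^{M} \left[ (2M+2-i) + 2(2M+2-i)(2M-i) + 2(M+i-1) \right] = \frac{N\,M(4M+7)(7M-1)}{6}$$

where, for iteration i, the first term counts the divides of the
forward elimination, the second its multiplies and subtracts, and
the third the multiplies and subtracts of the back substitution.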
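The closed form can be checked against the term-by-term sum. This Python sketch uses hypothetical function names. It adds up the three per-iteration terms of the summation and compares the result with N M(4M + 7)(7M - 1)/6. For M = 4 it gives 414 operations per block row (370 + 44). That makes 12 420 for 30 block rows, as in Table I.

```python
def serial_ops_per_block_row(m):
    """Sum of the per-iteration terms in the operation-count formula."""
    total = 0
    for i in range(1, m + 1):
        divides = 2 * m + 2 - i                          # forward-elimination divides
        mults_and_subs = 2 * (2 * m + 2 - i) * (2 * m - i)
        back_sub = 2 * (m + i - 1)                       # back-substitution multiplies + subtracts
        total += divides + mults_and_subs + back_sub
    return total


def serial_ops_closed_form(n_blocks, m):
    # M(4M+7)(7M-1) is always divisible by 6, so integer division is exact
    return n_blocks * m * (4 * m + 7) * (7 * m - 1) // 6
```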
TABLE I. DETERMINATION OF OPERATION COUNT

(a) Gaussian elimination

    Loop, IP   Divide   Multiply   Subtract   Total number of operations
       1          9        63         63                 135
       2          8        48         48                 104
       3          7        35         35                  77
       4          6        24         24                  54
                                                Total    370

(b) Back substitution

    Loop, I   Multiply   Subtract   Total number of operations
       1          4          4                  8
       2          5          5                 10
       3          6          6                 12
       4          7          7                 14
                                       Total   44

The data flow graphs in figures 3 and 4 suggest that a number
of operations can be done in parallel. For the forward
elimination process, the total number of operations which can
be done in parallel is a function of the iteration index IP.
Table II summarizes the maximum number of operations which
can be performed in parallel as a function of IP. The last
column shows the net operation count for each IP iteration
(three) if there are enough processors available to match the
number of operations that can be done in parallel. Each IP
iteration consists of a parallel divide cycle, followed by a
parallel multiply and subtract cycle. The net operation count
is one for the divide cycle and two for the multiply and
subtract cycle. As IP increases, the number of processors that
can be used decreases.

Table II also shows the maximum number of parallel
operations for each I iteration of the back-substitution
process. Again, the net operation count is shown for the case
where the number of processors matches the number of parallel
operations.

TABLE II. DETERMINATION OF PARALLEL OPERATION COUNT

(a) Forward elimination

    Loop, IP   Number of processors   Net operation count
       1               63                      3
       2               48                      3
       3               35                      3
       4               24                      3
                                     Total    12

(b) Back substitution

    Loop, I    Number of processors   Net operation count
       1                4                      2
       2                5                      2
       3                6                      2
       4                7                      2
                                     Total     8

Based on the total operation count for the fully parallel
forward elimination and back-substitution processes (assuming
30 block rows), the total operation count would be

$$\text{OPS} = 30(4 \times 3 \text{ operations}) + 30(4 \times 2 \text{ operations}) = 600 \text{ operations} = 30 \times 4 \times 5$$

or, in general,

$$\text{OPS} = N \times M \times 5 \text{ operations}$$

For the matrix used in this study, the theoretical speedup (S)
would be

$$S = \frac{12\,420}{600} = 20.7$$

and, in general,

$$S = \frac{\sum_{i=1}^{M} \left[ (2M+2-i) + 2(2M+2-i)(2M-i) + 2(M+i-1) \right]}{5M} = \frac{(4M+7)(7M-1)}{30}$$

for an N block row, M- by M-block matrix.
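The speedup arithmetic above amounts to a ratio of two operation counts. This Python sketch uses hypothetical function names. It divides the serial closed form by the fully parallel count of 5NM and checks the result against (4M + 7)(7M - 1)/30. The factor N cancels, so the speedup depends only on the block size M.

```python
def serial_ops(n_blocks, m):
    # Closed form of the Table I count: N * M(4M+7)(7M-1) / 6
    return n_blocks * m * (4 * m + 7) * (7 * m - 1) // 6


def parallel_ops(n_blocks, m):
    # Unlimited processors: 3 net operations per IP iteration and 2 per
    # I iteration, with M of each per block row (Table II)
    return n_blocks * m * (3 + 2)


def speedup(n_blocks, m):
    return serial_ops(n_blocks, m) / parallel_ops(n_blocks, m)
```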

The theoretical speedup would be achieved if the maximum
number of processors (63, as determined from table II) is
available to perform the computations. Any overhead due to
inefficient resource allocation (discussed in the next section)
or to communication between processors has been ignored,
because the time required for such overhead is difficult to
estimate. The theoretical speedup is therefore useful only as an
upper limit for judging whether parallel processing can
potentially benefit an application.

Determining the theoretical speedup is more complicated
when fewer than the maximum number of processors are
available. The speedup then also depends on how the
parallel computations are allocated to the processors. For
example, if there are four parallel operations and three
processors, the net operation count is two because the
fourth operation must be done serially on one of the three
processors. The theoretical speedup for the three-processor
matrix solver was determined to be 2.9 based on the best
resource allocation possible.
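One way to reproduce the three-processor figure is sketched below in Python. The function names are hypothetical. The sketch assumes that each parallel cycle of k independent operations takes ceil(k/P) steps on P processors, as in the four-operations, three-processor example above. It does not claim to be the exact procedure used in the report. With P = 3 and M = 4, a block row takes 143 net operations against 414 serial ones, a speedup of about 2.9. With P = 63 it takes 20, which recovers the 20.7 bound.

```python
def net_ops_per_block_row(m, nproc):
    """Net operation count for one block row when each parallel cycle of
    k independent operations takes ceil(k / nproc) steps."""
    def steps(k):
        return -(-k // nproc)                # integer ceiling of k / nproc

    net = 0
    for i in range(1, m + 1):                # forward elimination, IP = i
        divides = 2 * m + 2 - i
        mult_sub = (2 * m + 2 - i) * (2 * m - i)
        net += steps(divides)                # divide cycle
        net += 2 * steps(mult_sub)           # multiply cycle, then subtract cycle
    for i in range(1, m + 1):                # back substitution, I = i
        net += 2 * steps(m + i - 1)
    return net


def speedup_with(m, nproc):
    return net_ops_per_block_row(m, 1) / net_ops_per_block_row(m, nproc)
```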

Resource Allocation 

Allocating processor resources is a critical step in running 
any code on a parallel processor. If the processor resources 
(e.g., the number of processors) match the number of parallel
tasks in a problem, then a one-to-one allocation can be done.
This approach is not always efficient, however, as processors
can spend much time in an idle state. In some cases this
inefficiency is unavoidable. In others, a “packing” algorithm
can be used to assign the parallel tasks to the minimum number 
of processors necessary. If the processor resources do not 
match the number of parallel tasks, then a packing algorithm 
is a necessity. Ideally, an automated procedure would assign 
the parallel computations to the available processors and 
generate the appropriate load modules (to execute on the 
processors). Such a procedure, unfortunately, was not 
available for this study. 

A technique was needed for allocating the parallel operations of
the matrix solver to the appropriate processors. One such
technique, called loop unrolling, would decompose the loops
into individual equations. For example, consider the
following loop:

    FOR I := 1 TO 5 DO
      FOR J := I TO 5 DO
        A[I,J] := B[I,J] * C[I,J];

The doubly nested loop can be decomposed into 15 equations, 
and each of these could be executed in parallel. Suppose that 
only three processors were available for the solution of these 
equations. One method of allocating the equations to the 
processors would be to write all 15 equations and allocate 5 
equations to each processor. Although this appears easy for the
given example, it can be tedious if there are hundreds of 
thousands of equations. Another, less tedious, method would 
be to use the following code segment on each of the processors: 

    FOR I := 1 TO 5 DO
    BEGIN
      J := (I - 1) + PID;
      WHILE J <= 5 DO
      BEGIN
        A[I,J] := B[I,J] * C[I,J];
        J := J + NPROC
      END
    END;

where PID is the processor identification number (in this case 
1, 2, or 3) and NPROC is the number of processors (three 
for this example). In this method, called iteration allocation, 
each processor performs only the iterations which are assigned 
to it. The preceding example results in the allocation of 
computations as follows: 

                 P1        P2        P3
               A(1,1)    A(1,2)    A(1,3)
               A(1,4)    A(1,5)    A(2,4)
               A(2,2)    A(2,3)    A(3,5)
               A(2,5)    A(3,4)
               A(3,3)    A(4,5)
               A(4,4)
               A(5,5)
    Total OPS    7         5         3
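The allocation table can be reproduced by simulating the WHILE-loop scheme. This Python sketch has a hypothetical function name and is not code from the report. It lists the entries A(I,J), 1 <= I <= J <= 5, that each processor computes when J starts at (I - 1) + PID and advances by NPROC. With three processors it yields the 7, 5, and 3 entries shown above, and together they cover all 15 entries exactly once.

```python
def iteration_allocation(pid, nproc, n=5):
    """Entries (I, J), 1 <= I <= J <= n, computed by processor pid when
    J starts at (I - 1) + PID and strides by NPROC."""
    work = []
    for i in range(1, n + 1):
        j = (i - 1) + pid
        while j <= n:
            work.append((i, j))
            j += nproc
    return work
```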


With 3 processors and this allocation method, the original 
15 operations could be done in the equivalent of 7 operations. 
Although this method is less efficient than writing 15 separate 
equations, it is less tedious. In fact, by adding the following
lines of code before the J := (I - 1) + PID line in processors
1 and 3, the allocation can be improved:

            P1                          P3
    IF I > 3 THEN PID := 3      IF I > 3 THEN PID := 1
    ELSE PID := 1;              ELSE PID := 3;

This reallocates the A(4,4) and A(5,5) computations from 
processor 1 to processor 3. Now each processor solves five 
equations for a net count of five operations. However, this 
analysis ignores the overhead of the added control statements. 
Thus, iteration allocation is still less efficient than the loop- 
unrolling technique. But for large loops, iteration allocation 
is preferable since it is less tedious. 
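The effect of the added IF statements can be checked with a small simulation. In this Python sketch the `pid_for_row` hook is a hypothetical stand-in for the two added lines, and processor 2 keeps its original PID. It shows that A(4,4) and A(5,5) move from processor 1 to processor 3. Each processor then computes five entries, and all 15 are still covered exactly once.

```python
def allocate(pid, nproc, n=5, pid_for_row=None):
    """WHILE-loop iteration allocation; pid_for_row(i), if given, supplies
    the PID value used for row I (the added IF ... THEN PID := ... lines)."""
    work = []
    for i in range(1, n + 1):
        effective = pid_for_row(i) if pid_for_row else pid
        j = (i - 1) + effective
        while j <= n:
            work.append((i, j))
            j += nproc
    return work


remaps = {
    1: lambda i: 3 if i > 3 else 1,     # processor 1
    2: None,                            # processor 2 unchanged
    3: lambda i: 1 if i > 3 else 3,     # processor 3
}
```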

The number of processors available is a critical factor in 
considering which method to use. If the number of processors 
approaches the number of parallel tasks, then the iteration- 
allocation method essentially approaches the loop-unrolling 
technique (in the amount of work necessary to generate a 
parallel program). In general, if the number of parallel tasks 
is much greater than the number of processors, iteration 
allocation is preferable to loop-unrolling. This was the case 
for the parallel block tridiagonal solver described in this report, 
which made iteration allocation the method of choice. 

Parallel Processing Hardware Description 

The parallel processing hardware system used to run the 
block tridiagonal solver is a subset of the real-time 
multiprocessor simulator (RTMPS) described in reference 2. 
Figure 5 is a block diagram of the actual hardware used. The 
separate processors on the real-time bus are Motorola VM04 
microcomputers, rather than the VM02 microcomputers used 
on the original RTMPS. The RTX channel linking the inter- 
active and real-time busses still uses VM02 microcomputers. 
In the current configuration, a maximum of three VM04
processors can be resident on the real-time bus. Expansion
to two additional processors is possible with the existing card
cage, but was not done for this study.

Figure 5.— Subset real-time multiprocessor system (RTMPS) architecture used
for matrix solver study.

Figure 6.— VM04 microcomputer architecture.

Figure 6 shows the architecture of the VM04 micro- 
computer. A dedicated memory bus connects a processor board 
to a main memory board. Both the processor board and the 
memory board have separate system bus interfaces. A high- 
speed cache memory on the processor board reduces memory 
access times for frequently referenced memory locations. The 
caching of main memory is handled by hardware and is 
transparent to the user when only a single processor is used. 

Because of the cache memory, extra care must be taken when 
programming multiprocessor systems. A processor can have 
instances of “stale data” in its cache if another processor 
communicates with it through shared memory over the system 
bus. To avoid this problem, processors must disable their cache 
memories before accessing shared memory. One method, 
although time consuming, is to call a procedure to disable the 
cache each time a program requires access to shared memory. 
Another method would be to disable the cache entirely; 
however, there are many local-memory accesses which would 
lose the benefit of the faster cache memory. Fortunately, the 
VM04 contains a control register which allows the user to enable 
or disable caching of memory accesses that occur over the 
system bus. This register can be set once, and cache memory 
can be disabled for all shared-memory references (via the system 
bus), while local-memory references are still cached. 

Software Environment 

The existing RTMPL was designed to efficiently handle one- 
dimensional mathematical models. All arithmetic is done in 
scaled-fractions, and indexed variables (e.g., arrays) are not 
supported. An alternative language was needed to allow 
convenient programming of the block tridiagonal solver on 
the RTMPS hardware. To fill this need, a method was devised 
to allow PASCAL programs to be called from an RTMPL 
program. Thus, the solver could be coded in the PASCAL
language, a structured language with floating-point and indexed 
variable support. 


Running the PASCAL program as a subroutine under 
RTMPL maintains compatibility with the RTMPOS. This is 
important because RTMPL generates a data base that 
RTMPOS uses to load and execute the parallel processing 
programs. Changes were made to RTMPOS to allow 
recognition of the floating-point data type. Thus, many 
interactive features provided by RTMPOS for scaled-fraction 
programs could also be applied to floating-point programs.

An RTMPL macro was written to transfer control from 
RTMPL to PASCAL. A new PASCAL initialization routine 
(ref. 9) was written to save any necessary RTMPL registers, 
execute the PASCAL program, restore the RTMPL registers, 
and return to the RTMPL program. RTMPL variables were 
used as buffers to transfer information from the PASCAL 
program to the RTMPL program. Special procedures were 
written to do the transfers. This represents one of the 
disadvantages of the RTMPL-PASCAL approach: Neither 
program recognizes the variables of the other. In order to 
output any results from the PASCAL program to RTMPOS, 
data must explicitly be transferred from a PASCAL variable 
to an RTMPL variable. This inconvenience can translate into 
high overhead if data is output frequently from the PASCAL 
program. Fortunately, for the block tridiagonal solver, the only 
output required was at the end of the program. 

The automated data-transfer setup feature of the RTMPL 
cannot be used with the RTMPL-PASCAL approach. All data 
transfers must be done from within the PASCAL program. 
One method for transferring data from PASCAL is to call a 
procedure to do the transfer. However, if there is frequent 
data transfer in the program, the overhead of the procedure 
call can significantly reduce the transfer speed. A better 
method is to exploit the way that PASCAL handles variables. 
Variables declared in the main PASCAL program are global 
variables; variables declared from within a procedure are local 
to that procedure. Global variables are shared by the main 
program and all procedures. This suggests that a shared- 
memory multiprocessor environment can be implemented by
using the global variable area as the shared memory. The 
advantage of a shared-memory approach is that data can be 
transferred implicitly between processors by a simple memory 
reference instruction. The need for a procedure call to transfer 
data is eliminated, thus reducing overhead.

Figure 7 shows how the PASCAL shared-memory approach 
is implemented for two processors connected by a bus. The 
PASCAL compiler maintains two registers for variable 
storage. The first (A5) points to the base of the global 
variables. The other register (A6) points to the base of the 
local variable area. If both processors (P1, P2) have dual-ported
memory, part of the memory of P1 can be shared with P2.
The PASCAL programs for P1 and P2 would have the
shared-memory variables declared first. The program code body
would be contained in a procedure call, with any local variables 
declared within the procedure. Then the main program would 
merely call this procedure. The structure of the PASCAL 
program for both processors would be as follows: 
    PROGRAM SOLVE;
    VAR
      (* declarations of shared-memory variables *)

    PROCEDURE SOLVE_CODE;
    VAR
      (* declarations of variables local to the processor *)
    BEGIN
      (* the main code body goes here *)
    END; (* of SOLVE_CODE *)

    BEGIN (* of the main program *)
      SOLVE_CODE
    END. (* of program *)

Figure 7.— PASCAL shared-memory approach (memory maps of P1 and P2, showing the global variable pointer A5 and the local variable pointer A6).

Since P1 has the shared-memory area in its own memory,
its global variable register A5 can be left as set by the PASCAL
compiler. For P2 to reference the shared-memory area, its
register A5 must have the base address of P1’s memory (from
the bus) added to it. When this is done, all global variable 
accesses set up by the compiler will automatically go to shared 
memory. This approach can be used for as many processors 
as required. 

All PASCAL language statements except those dealing with 
I/O, files, and pointers can be used. All I/O is done through 
the facilities provided by RTMPL and RTMPOS. These 
facilities include on-line examination of program variables and 
read advisories. The read advisory provides a method for 
recording large arrays of data from a program onto a disk file 
(RTMPL user’s manual). This method was used for the block 
tridiagonal solver to record the value of the result vector. 


Discussion of Results 


The block tridiagonal solver was run on the RTMPS system 
with one, two, and three processors. The PASCAL code for 
each of the cases is contained in the appendix of this report. 


The matrix notation used for the rotor dynamics problem is 
retained in this code. Array B in the PASCAL code is the 
matrix of coefficients, the array C is the right-side vector, and 
the array DU is the result vector. The first VAR declaration 
is the global, or shared-memory area. A multiply indexed array 
is used for the block tridiagonal matrix. The first two indices 
(from left to right) are the row and column indices within a 
block. The next index is the block row index, and the last index 
is the block index (1, left; 2, middle; 3, right). The vectors 
DU and C are doubly indexed arrays: the first index indicates the
element within the current block row, and the second index
is the block row index. Although the use of multiple indices 
simplifies the programming procedure, it is very costly in 
computation time. 
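The four-index layout of array B can be related to positions in the full matrix. This Python sketch uses hypothetical helper names. It assumes B(i, j, ib, k) is held as nested lists `b[i-1][j-1][ib-1][k-1]`, and it expands the blocks into a dense matrix. Element i of block row ib lies in row (ib - 1)M + i. Entry j of block k (1 left, 2 middle, 3 right) lies in column (ib + k - 3)M + j. The left block of the first block row and the right block of the last block row fall outside the matrix and are skipped.

```python
def full_row(i, ib, m):
    """1-based row of the full matrix holding element i of block row ib."""
    return (ib - 1) * m + i


def full_col(j, ib, k, m):
    """1-based column of entry j of block k (1 left, 2 middle, 3 right)."""
    return (ib + k - 3) * m + j


def to_dense(b, n_blocks, m):
    """Expand B(i, j, ib, k), stored as b[i-1][j-1][ib-1][k-1], into a
    dense (n_blocks*m)-square list-of-lists matrix."""
    size = n_blocks * m
    dense = [[0.0] * size for _ in range(size)]
    for ib in range(1, n_blocks + 1):
        for k in (1, 2, 3):
            if (ib == 1 and k == 1) or (ib == n_blocks and k == 3):
                continue                     # block lies outside the matrix
            for i in range(1, m + 1):
                for j in range(1, m + 1):
                    dense[full_row(i, ib, m) - 1][full_col(j, ib, k, m) - 1] = \
                        b[i - 1][j - 1][ib - 1][k - 1]
    return dense
```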

The code for the single-processor solver is a direct PASCAL 
translation of the FORTRAN code used in the rotor dynamics 
problem. Procedure GETINF is used to send information about 
the variables (in this case, the result vector) to the RTMPS 
control processor. Calling procedure GETINF triggers a read 
advisory on the control processor which saves results in a disk 
file. Procedures IDATA and IDATF initialize the matrix and 
right-side vector to values that were generated by the rotor 
dynamics simulation. The use of actual data from the rotor 
dynamics simulation was important since the existence and 
accuracy of a matrix solution depends heavily on the matrix 
values. The results generated by the single-processor solver, 
as well as those for the two- and three-processor solvers, were 
compared to results generated by the rotor dynamics simulation 
on a mainframe computer. In all cases, the results matched 
exactly. 

There are two versions of the two-processor solver given 
in the appendix: The first contains the original serial back- 
substitution algorithm; the other does the back substitution by 
using the column sweep approach. In both versions, the 
forward elimination process is done in parallel, and iterations 
within the IP loop are allocated to each processor. This is done 
with the WHILE-DO construct, as described in the Resource 
Allocation section of this report. Before each IP iteration
begins, both processors synchronize to ensure that the previous
IP iteration has been completed. This is critical, since results from
the previous iteration are needed to calculate the next iteration.
Two boolean flags, one for each processor, are used to
synchronize the processors. The flags are located in the global,
or shared-memory, area. Each processor sets its own
flag true after it has finished an IP iteration. Before starting
the next iteration, each processor checks the other's flag to
make sure they are synchronized and then clears that flag,
and the iteration can begin. If one processor is not done, the
other will wait for it. In the listings, the wait loop counts its
passes; after CMAX = 1,000,000 passes, the waiting processor
sets the other's flag itself and increments an error count, ERR.
This counter is tested to exit the wait loop
if the other processor does not respond. 
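The handshake described above can be sketched with Python threads standing in for processors. This sketch relies on stated assumptions and is not the listings' code. The work is a toy recurrence in which each new value needs two results from the previous iteration, one of them computed by the other thread. Instead of booleans that each processor sets and the other clears, each thread publishes a count of completed iterations. Because each counter has a single writer, no clearing step can race with the owner setting its flag again. The function name and SPIN_LIMIT are hypothetical, and on timeout the sketch raises an error instead of forcing the flag as the listings do.

```python
import threading
import time

SPIN_LIMIT = 10_000_000  # polls before giving up (the listings use CMAX = 1000000)


def run_two_processors(n_iter, width=4):
    """Thread 0 handles columns 0, 2, ... and thread 1 handles columns 1, 3, ...
    (the listings' J := J + 2 stride). Iteration k reads results that both
    threads wrote in iteration k - 1, so each thread first waits until the
    other has finished iteration k - 1."""
    hist = [[float(j) for j in range(width)]]
    hist += [[0.0] * width for _ in range(n_iter)]
    done = [0, 0]  # done[p]: iterations completed by processor p (one writer each)

    def processor(p):
        other = 1 - p
        for k in range(1, n_iter + 1):
            polls = 0
            while done[other] < k - 1:               # synchronize with the other thread
                polls += 1
                if polls > SPIN_LIMIT:
                    raise TimeoutError(f"processor {p} stalled at iteration {k}")
                time.sleep(0)                        # yield instead of hogging the CPU
            for j in range(p, width, 2):             # interleaved column allocation
                hist[k][j] = hist[k - 1][j] + hist[k - 1][(j + 1) % width]
            done[p] = k                              # announce that iteration k is done

    threads = [threading.Thread(target=processor, args=(p,)) for p in (0, 1)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return hist[n_iter]
```

Each iteration writes a fresh row of `hist`, so the only ordering the threads must respect is "finish k - 1 before reading it", which is exactly what the counters enforce.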

The version of the two-processor solver with the column- 
sweep back-substitution algorithm differs from the serial back 
substitution version in two ways: (1) The synchronization is 
done with an assembly language procedure to decrease its 
execution time. The assembly procedure performs exactly the 
same function as the original PASCAL version of the
procedure (which is commented out in the listing); and (2) the 
back-substitution process, previously done on one processor, 
is now done on two processors. After an element of the result
vector is computed, both processors subtract its contribution
from the right-side elements of the rows above it, each processor
taking alternate rows. Both processors then
synchronize, compute the next element of the result vector, and
repeat the process until the entire result vector is obtained. 
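A serial Python sketch of column-sweep back substitution follows. It assumes a unit upper triangular system, which is the form the elimination leaves because each pivot row is divided by its pivot. The function name and the round-robin row split are assumptions chosen to mirror the listing's stride-2 loops, not the report's code. The workers' shares run one after another here, so the sketch shows the data flow and not the timing.

```python
def column_sweep(U, c, n_proc=2):
    """Column-sweep back substitution for a unit upper triangular matrix U.

    As soon as x[i] is known, its contribution is subtracted from the right
    side of every row above it. The rows of that column update are dealt
    round-robin to n_proc workers (worker p takes rows i-1-p, i-1-p-n_proc,
    ...). On a parallel machine the shares would run concurrently and the
    workers would synchronize before the next x[i] is read.
    """
    n = len(c)
    c = list(c)            # work on a copy of the right side
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = c[i]        # unit diagonal: no division needed
        for p in range(n_proc):
            for r in range(i - 1 - p, -1, -n_proc):
                c[r] -= U[r][i] * x[i]
    return x
```

Each row is updated exactly once per column, in the same column order, so the result does not depend on how many workers share a column.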

The code for the three-processor version of the solver uses 
a synchronization method which is more efficient than that 
used in the two-processor case. When each processor is done 
with its iteration, it sends a flag to each of the other processors 
in the system. Before starting the next iteration, each processor 
tests for the flags sent to it by the other processors. Since these 
flags are now in local memory (not global memory as in the 
two-processor case), the processor does not have to continually 
access the bus to test a flag. This reduces bus traffic for those 
processors which may still be accessing shared memory to 
complete their computations. 
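The "send a flag to every other processor" scheme can be mimicked with per-processor inboxes. In the Python sketch below, which uses hypothetical names, threads stand in for the three processors and `inbox[p]` stands in for processor p's local memory. A finishing processor writes its iteration number into every other processor's inbox, and a waiting processor polls only its own inbox. The toy recurrence and the omission of a timeout counter are simplifications of this sketch.

```python
import threading
import time


def run_with_mailboxes(n_proc, n_iter, width=6):
    """inbox[p][q] holds the last iteration number processor q reported to
    processor p. Each processor writes into the others' inboxes and reads
    only its own while waiting."""
    hist = [[float(j) for j in range(width)]]
    hist += [[0.0] * width for _ in range(n_iter)]
    inbox = [[0] * n_proc for _ in range(n_proc)]

    def processor(p):
        others = [q for q in range(n_proc) if q != p]
        for k in range(1, n_iter + 1):
            while any(inbox[p][q] < k - 1 for q in others):
                time.sleep(0)                        # poll this processor's inbox only
            for j in range(p, width, n_proc):        # columns p, p + n_proc, ...
                hist[k][j] = hist[k - 1][j] + hist[k - 1][(j + 1) % width]
            for q in others:
                inbox[q][p] = k                      # "send" the done flag to q

    threads = [threading.Thread(target=processor, args=(p,)) for p in range(n_proc)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return hist[n_iter]
```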

Another technique used in the three-processor solver to 
reduce bus traffic is the copying of frequently accessed 
variables from shared memory to local memory. In the three- 
processor code, arrays BI2 and BI3 are local-memory variables 
which contain current matrix row information used frequently 
throughout the program. These arrays are loaded with 
appropriate values from shared memory at the beginning of 
an IP iteration. All future references to these values are made 
from local memory, and the number of bus accesses required 
is reduced. 
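The saving from local copies can be illustrated by counting reads. The three-processor listing's BI2 and BI3 arrays are not reproduced here. Instead, the Python sketch below uses a hypothetical `CountingArray` that models shared memory and counts every element read as one bus access, and it performs one elimination step with and without a local copy of the pivot row.

```python
class CountingArray:
    """Stand-in for a shared-memory matrix; counts element reads
    (each read models one bus access)."""

    def __init__(self, rows):
        self.rows = [list(r) for r in rows]
        self.reads = 0

    def get(self, i, j):
        self.reads += 1
        return self.rows[i][j]

    def set(self, i, j, value):
        self.rows[i][j] = value


def eliminate_below(shared, ip, copy_pivot_row):
    """Subtract multiples of the (already normalized) pivot row ip from the
    rows below it. With copy_pivot_row=True the pivot row is read from
    shared memory once, into a local list, at the start of the step."""
    width = len(shared.rows[ip])
    local = [shared.get(ip, j) for j in range(width)] if copy_pivot_row else None
    for i in range(ip + 1, len(shared.rows)):
        br = shared.get(i, ip)
        for j in range(ip, width):
            pivot = local[j] if copy_pivot_row else shared.get(ip, j)
            shared.set(i, j, shared.get(i, j) - br * pivot)


rows = [[1.0, 0.5, 0.25, 2.0, 1.0, 3.0, 0.5, 1.5],
        [2.0, 1.0, 4.0, 3.0, 0.0, 1.0, 2.0, 1.0],
        [3.0, 2.0, 1.0, 0.5, 1.0, 2.0, 3.0, 4.0],
        [0.5, 1.0, 2.0, 3.0, 4.0, 0.5, 1.0, 2.0]]
plain, cached = CountingArray(rows), CountingArray(rows)
eliminate_below(plain, 0, copy_pivot_row=False)
eliminate_below(cached, 0, copy_pivot_row=True)
print(plain.reads, cached.reads)  # 51 35
```

With three rows below the pivot and eight columns, the uncached step reads the pivot row three times (51 reads in total), whereas the cached step reads it once (35 reads). The arithmetic and the results are identical.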

Table III summarizes the running time for each of the three 
cases. The speedup for each of the multiprocessor runs is also 
shown. A 30-block row matrix, with 4 by 4 blocks, was solved 
in each case. For the two-processor case, results are given 
for the serial back substitution and for the column-sweep back- 
substitution algorithms. As expected, the column sweep 
algorithm gives a faster solution. The two-processor case
with column-sweep back substitution shows a speedup of 1.96,
very close to the ideal value of 2.
The three-processor case is less efficient, with a speedup of
2.7. The reduction in efficiency can be attributed to a number
of factors: resource allocation, loss of cache variables, and
increased access time for the shared memory because of
increased bus traffic. 
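The speedup in Table III is the single-processor time divided by the multiprocessor time. Dividing the speedup by the number of processors gives the parallel efficiency. The short Python check below, with hypothetical names, recomputes both quantities from the tabulated times. Its speedups agree with the table to within about 0.001 (the times are themselves rounded). The efficiencies come out near 0.98 for two processors with column-sweep back substitution and near 0.90 for three processors.

```python
T_SINGLE = 0.9502  # one processor, seconds (Table III)
MULTI = [          # (processors, back substitution, time in seconds), Table III
    (2, "serial", 0.5168),
    (2, "column sweep", 0.4834),
    (3, "column sweep", 0.3500),
]


def speedup_and_efficiency(t_single, t_parallel, processors):
    """Speedup S = T1 / Tp and parallel efficiency E = S / p."""
    s = t_single / t_parallel
    return s, s / processors


for p, method, tp in MULTI:
    s, e = speedup_and_efficiency(T_SINGLE, tp, p)
    print(f"{p} processors, {method}: speedup {s:.3f}, efficiency {e:.3f}")
```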


TABLE III. - TIMING INFORMATION FOR MULTIPROCESSOR

Number of     Back            Time,     Speedup
processors    substitution    sec

1             Column sweep    0.9502    -
2             Serial          0.5168    1.838
2             Column sweep    0.4834    1.965
3             Column sweep    0.3500    2.715


Several important notes are given here regarding cache 
memory. All multiprocessor runs were made with the cache 
memory enabled on all processors which did not contain the 
shared memory. The processor which did contain the shared 
memory had its cache disabled. This processor could not take 
advantage of the control register cache disabling for bus 
accesses (described in the hardware section) since all variables 
are physically within its own memory. The single-processor 
case used as the reference for speedup calculations was run 
with cache memory enabled. Variables which are in shared 
memory for the multiprocessor cases (and, hence, not cached) 
can be cached in the single-processor case. Thus, a certain 
percentage of the speedup achieved through parallel processing 
can be offset by the loss of cache variables. Although it appears 
that this is not a factor in the two-processor case, it may 
account for some of the overhead in the three-processor case. 

A synchronization problem was encountered during the 
development of the three-processor solver which highlighted 
one of the difficulties with transporting existing algorithms 
(written for serial processors) to parallel processors. In the 
Gaussian elimination process, before an element below the
diagonal is eliminated, its original value is needed to
compute other elements of the matrix. Thus, the sequence of
the computations is critical. All processors would have to be
synchronized (in addition to the synchronization that must be
done for each IP iteration) to ensure that the original value
of the element being eliminated has been used by the other 
processors needing it. For example, consider the following 
elimination step for a 3 by 3 matrix: 

(1) BR := A(2,1);

(2) A(2,1) := A(2,1) - BR * A(1,1)/A(1,1);

(3) A(2,2) := A(2,2) - BR * A(1,2)/A(1,1);

(4) A(2,3) := A(2,3) - BR * A(1,3)/A(1,1);

(5) F(2) := F(2) - BR * F(1)/A(1,1);

where A is the array of matrix elements and F is the
right-side vector. In statement (1), BR is assigned the value of
A(2,1), and the computation of A(2,1) in statement (2) will
result in zero. Statement (2) overwrites the value that statement (1)
reads, so the two statements are linked by an antidependence
(ref. 10). Assume that four processors are available to do
statements (2) through (5), with all data resident in a shared
memory (except for BR, which is in each processor's local
memory). Each processor must execute the assignment
statement that copies the value of A(2,1) from shared memory
into its local variable BR. Since each processor performs the
same number of operations, it would appear that each processor
could safely read A(2,1) before it is changed by processor 1.
This also assumes that all processors begin their operations
at the same time. However, timing differences between
processors, communication delays between processors and
shared memory, and load imbalances make this assumption
dangerous. This was the case for the three-processor version
of the block tridiagonal solver as it was derived from the
original FORTRAN code used in the rotor dynamics
simulation. Synchronization routines were necessary, which
added overhead and resulted in slower execution. 
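The following Python sketch shows how a run-ahead processor corrupts the result. The `simulate` function and its sample 3 by 3 values are invented for illustration. It executes statements (1) through (5) on four simulated processors in a chosen interleaving, rather than with real threads, so each outcome is reproducible.

```python
def simulate(schedule):
    """Run the five-statement example on four simulated processors.

    Processor p first executes statement (1), copying A(2,1) into its own
    local BR, and then its one assigned statement: processor 1 runs (2),
    processor 2 runs (3), processor 3 runs (4), processor 4 runs (5).
    `schedule` lists processor numbers; each entry runs that processor's
    next step, so the list fixes one possible interleaving.
    """
    A = [[2.0, 1.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]  # A(1,1) is A[0][0]
    F = [1.0, 2.0, 3.0]
    br = {}
    next_step = {p: 0 for p in (1, 2, 3, 4)}
    for p in schedule:
        if next_step[p] == 0:
            br[p] = A[1][0]                                # (1) BR := A(2,1)
        elif p == 1:
            A[1][0] = A[1][0] - br[p] * A[0][0] / A[0][0]  # (2)
        elif p == 2:
            A[1][1] = A[1][1] - br[p] * A[0][1] / A[0][0]  # (3)
        elif p == 3:
            A[1][2] = A[1][2] - br[p] * A[0][2] / A[0][0]  # (4)
        else:
            F[1] = F[1] - br[p] * F[0] / A[0][0]           # (5)
        next_step[p] += 1
    return A[1], F[1]


# Every processor reads A(2,1) before processor 1 zeroes it: correct row 2.
print(simulate([1, 2, 3, 4, 1, 2, 3, 4]))  # ([0.0, 3.0, 0.0], 0.0)
# Processor 1 runs ahead: the others copy BR = 0.0 and leave row 2 unchanged.
print(simulate([1, 1, 2, 3, 4, 2, 3, 4]))  # ([0.0, 5.0, 6.0], 2.0)
```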

The addition of synchronization routines for this part of the
code can be avoided, however, by examining the Gaussian
elimination process more closely. The back-substitution process only
requires elements above the diagonal to compute the result
vector. The zeros below the diagonal are not needed. In fact,
the computations which create the zeros are not necessary if
the resulting upper triangular matrix is needed only to
compute the result vector. If the A(2,1) calculation were
eliminated in the previous example, each processor would read
the correct value of A(2,1) without synchronization problems.
This approach was taken for the three-processor solver to 
achieve the speedup of 2.7. The single processor time used 
in the speedup calculation includes the computation of zero 
elements below the diagonal. If these computations are 
removed from the single-processor code also, then the relative 
speedup is reduced to 2.49. This is because the single- 
processor solver has fewer computations to do and, thus, runs 
faster. 
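The effect of skipping the zero-producing updates can be checked on a small dense (non-block) system. The Python function below is a hypothetical illustration and not the solver's code. It counts only the matrix-element updates made during elimination. With `store_zeros=False`, the subdiagonal entries keep their original values and are never read again. The upper triangle, and therefore the solution, is unchanged, and one update is saved for every subdiagonal element, n(n-1)/2 in total.

```python
def eliminate_and_solve(A, b, store_zeros):
    """Gaussian elimination without pivoting, then back substitution.
    Returns (solution, number of matrix-element updates)."""
    n = len(b)
    A = [row[:] for row in A]
    b = b[:]
    ops = 0
    for k in range(n):
        piv = A[k][k]
        for j in range(k, n):                  # normalize the pivot row
            A[k][j] /= piv
        b[k] /= piv
        for i in range(k + 1, n):
            br = A[i][k]                       # read before any write to A[i][k]
            start = k if store_zeros else k + 1
            for j in range(start, n):          # j = k would only store a zero
                A[i][j] -= br * A[k][j]
                ops += 1
            b[i] -= br * b[k]
    x = [0.0] * n
    for i in reversed(range(n)):               # reads only entries with j > i
        x[i] = b[i] - sum(A[i][j] * x[j] for j in range(i + 1, n))
    return x, ops


A = [[4.0, 1.0, 2.0], [1.0, 5.0, 1.0], [2.0, 1.0, 6.0]]
b = [7.0, 7.0, 9.0]
print(eliminate_and_solve(A, b, True)[1], eliminate_and_solve(A, b, False)[1])  # 8 5
```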

Concluding Remarks 

An approach to implementing a block tridiagonal matrix 
solver on a shared-memory parallel processor has been 
demonstrated. It should be possible to run the PASCAL 
programs for the one-, two-, and three-processor solvers on 
other shared-memory parallel processors, if the I/O and 
synchronization procedures are reproduced on the target 
system. The same approach can also be extended to more 
processors if they are available. 

The results presented here are only a small part of the 
potential research that can be done on parallel processing of 
matrix solvers and solution of partial differential equations in 
general. Alternative architectures exist which have the potential
for providing extremely fast matrix solutions. Architectures
incorporating multiple array or vector processors, such as the 
ALLIANT FX/8 or CRAY X-MP, are examples. A pipelined 
math unit can perform operations much faster than the 
nonpipelined units found in typical microcomputers and main- 
frames. The key to tapping the potential of these architectures 
is the identification of at least two levels of parallelism in a 
given problem. The first is the operation level, which 
corresponds to the vectorization process done for single vector 
processors. The second is the vector operation level. Par- 
allelism here consists of multiple vector operations which can 
be done concurrently. 

Another high-potential research area is the investigation of 
alternative algorithms, given an architecture which can exploit 
the natural parallelism in the algorithm. There are many highly 
parallel iterative algorithms for solving systems of equations. 
Among these are successive overrelaxation methods (SOR) 
and conjugate gradient methods. Given an appropriate 
architecture, these methods could potentially yield higher 
performance than the Gaussian elimination method. 

The selection of an appropriate algorithm for solving any 
problem on a parallel processor is a function of many 
parameters. NASA Lewis Research Center is currently 
constructing a hypercluster system to provide a test bed for 
investigating architecture and algorithm interactions (ref. 10). 
The combination of multiple vector and scalar processors in 
a flexible interconnection scheme will allow a wide variety 
of architectural concepts to be studied. It is hoped that future 
work using the hypercluster will answer some of the questions 
regarding appropriate architecture and algorithm combinations 
for both computational fluid mechanics and computational 
structural mechanics problems. 


Lewis Research Center 

National Aeronautics and Space Administration 
Cleveland, Ohio, November 17, 1988 
Appendix - PASCAL Program Listings


Single-Processor Block Tridiagonal Solver 


PROGRAM SOLVE;

TYPE
  RVECT = ARRAY [1..4, 1..32] OF REAL;
  AMAT = ARRAY [1..4, 1..4, 1..32, 1..3] OF REAL;

VAR
  A, B : AMAT;
  F, C, DU : RVECT;
  BP, BR : REAL;
  N, I, J, IB, IP, I1 : INTEGER;
  I2, I3, II1, K, J1 : INTEGER;
  II, JB, IB1, IBI : INTEGER;

PROCEDURE GETINF (VAR ADDR : RVECT; NUMEL : INTEGER); FORWARD;
PROCEDURE IDATA (VAR MATRXA : AMAT; VNUM : INTEGER); FORWARD;
PROCEDURE IDATF (VAR MATRXF : RVECT; VNUM2 : INTEGER); FORWARD;

**** IDATF ASSUMED EXTERNAL
**** IDATA ASSUMED EXTERNAL
**** GETINF ASSUMED EXTERNAL

BEGIN
  I1 := 1;
  I2 := 2;
  I3 := 3;
  N := 30;

  IDATA(B, 1536);
  IDATF(C, 128);

  FOR IB := 1 TO N DO
    FOR IP := 1 TO 4 DO
    BEGIN
      BP := B[IP, IP, IB, I2];
      FOR J := IP TO 4 DO
        B[IP, J, IB, I2] := B[IP, J, IB, I2] / BP;
      FOR J := 1 TO 4 DO
        B[IP, J, IB, I3] := B[IP, J, IB, I3] / BP;
      C[IP, IB] := C[IP, IB] / BP;
      IF IP <> 4 THEN
      BEGIN
        II1 := IP + 1;
        FOR I := II1 TO 4 DO
        BEGIN
          BR := B[I, IP, IB, I2];
          FOR J := IP TO 4 DO
            B[I, J, IB, I2] := B[I, J, IB, I2] - BR * B[IP, J, IB, I2];
          FOR J := 1 TO 4 DO
            B[I, J, IB, I3] := B[I, J, IB, I3] - BR * B[IP, J, IB, I3];
          C[I, IB] := C[I, IB] - BR * C[IP, IB];
        END; { FOR I }
      END; { IF IP }
      IF IB <> N THEN
      BEGIN
        FOR I := 1 TO 4 DO
        BEGIN
          IB1 := IB + 1;
          BR := B[I, IP, IB1, I1];
          FOR J := IP TO 4 DO
            B[I, J, IB1, I1] := B[I, J, IB1, I1] - BR * B[IP, J, IB, I2];
          FOR J := 1 TO 4 DO
            B[I, J, IB1, I2] := B[I, J, IB1, I2] - BR * B[IP, J, IB, I3];
          C[I, IB1] := C[I, IB1] - BR * C[IP, IB];
        END; { FOR I }
      END; { IF IB }
    END; { FOR IP }

  FOR IBI := 1 TO N DO
  BEGIN
    IB := N + 1 - IBI;
    IB1 := IB + 1;
    FOR II := 1 TO 4 DO
    BEGIN
      I := 5 - II;
      DU[I, IB] := 1.0 * C[I, IB];
      IF I <> 4 THEN
      BEGIN
        J1 := I + 1;
        FOR J := J1 TO 4 DO
          DU[I, IB] := DU[I, IB] - B[I, J, IB, I2] * DU[J, IB];
      END; { IF I <> 4 }
      IF IB <> N THEN
      BEGIN
        FOR J := 1 TO 4 DO
          DU[I, IB] := DU[I, IB] - B[I, J, IB, I3] * DU[J, IB1];
      END; { IF IB <> N }
    END; { FOR II }
  END; { FOR IBI }
  K := 128;
  GETINF(DU, K);
END.

**** NO ERROR(S) AND NO WARNING(S) DETECTED
**** 90 LINES 3 PROCEDURES
**** 944 PCODE INSTRUCTIONS


Dual-Processor Block Tridiagonal Solver; Serial Back Substitution, Processor 1 


PROGRAM SOLVE;

TYPE
  RVECT = ARRAY [1..4, 1..32] OF REAL;
  AMAT = ARRAY [1..4, 1..4, 1..32, 1..3] OF REAL;

VAR
  B : AMAT;
  C, DU : RVECT;
  SYNC1, SYNC2 : BOOLEAN;

PROCEDURE GETINF (VAR ADDR : RVECT; NUMEL : INTEGER); FORWARD;
PROCEDURE PUTINT (IVAL : INTEGER; VAR IPTR : INTEGER); FORWARD;
PROCEDURE IDATA (VAR MATRIXA : AMAT; VCNT : INTEGER); FORWARD;
PROCEDURE IDATF (VAR MATRIXF : RVECT; VCNT1 : INTEGER); FORWARD;

PROCEDURE COPROC;

CONST
  CMAX = 1000000;

VAR
  A : AMAT;
  F : RVECT;
  BP, BR : REAL;
  N, I, J, IB, IP, I1 : INTEGER;
  I2, I3, II1, K, J1 : INTEGER;
  II, JB, IB1, IBI : INTEGER;
  ERR, SCNT1, IPTR : INTEGER;

BEGIN
  SYNC1 := FALSE;
  IPTR := 0;
  SCNT1 := 0;
  ERR := 0;
  I1 := 1;
  I2 := 2;
  I3 := 3;
  N := 30;

  IDATA(B, 1536);
  IDATF(C, 128);

  SYNC1 := TRUE;

  FOR IB := 1 TO N DO
    FOR IP := 1 TO 4 DO
    BEGIN
      SCNT1 := 0;
      REPEAT { SYNCHRONIZE WITH PROCESSOR 2 }
        SCNT1 := SCNT1 + 1;
        IF SCNT1 > CMAX THEN
        BEGIN
          SYNC2 := TRUE;
          ERR := ERR + 1;
        END;
      UNTIL SYNC2;
      SYNC2 := FALSE;

      BP := B[IP, IP, IB, I2];
      J := IP;
      WHILE J <= 4 DO
      BEGIN
        B[IP, J, IB, I2] := B[IP, J, IB, I2] / BP;
        J := J + 2;
      END;
      J := 1;
      WHILE J <= 4 DO
      BEGIN
        B[IP, J, IB, I3] := B[IP, J, IB, I3] / BP;
        J := J + 2;
      END;
      C[IP, IB] := C[IP, IB] / BP;

      IF IP <> 4 THEN
      BEGIN
        II1 := IP + 1;
        FOR I := II1 TO 4 DO
        BEGIN
          BR := B[I, IP, IB, I2];
          J := IP;
          WHILE J <= 4 DO
          BEGIN
            B[I, J, IB, I2] := B[I, J, IB, I2] - BR * B[IP, J, IB, I2];
            J := J + 2;
          END;
          J := 1;
          WHILE J <= 4 DO
          BEGIN
            B[I, J, IB, I3] := B[I, J, IB, I3] - BR * B[IP, J, IB, I3];
            J := J + 2;
          END;
          C[I, IB] := C[I, IB] - BR * C[IP, IB];
        END; { FOR I }
      END; { IF IP }
      IF IB <> N THEN
      BEGIN
        FOR I := 1 TO 4 DO
        BEGIN
          IB1 := IB + 1;
          BR := B[I, IP, IB1, I1];
          J := IP;
          WHILE J <= 4 DO
          BEGIN
            B[I, J, IB1, I1] := B[I, J, IB1, I1] - BR * B[IP, J, IB, I2];
            J := J + 2;
          END;
          J := 1;
          WHILE J <= 4 DO
          BEGIN
            B[I, J, IB1, I2] := B[I, J, IB1, I2] - BR * B[IP, J, IB, I3];
            J := J + 2;
          END;
          C[I, IB1] := C[I, IB1] - BR * C[IP, IB];
        END; { FOR I }
      END; { IF IB }
      SYNC1 := TRUE;
    END; { FOR IP }

  SCNT1 := 0;
  REPEAT { SYNCHRONIZE WITH PROCESSOR 2 }
    SCNT1 := SCNT1 + 1;
    IF SCNT1 > CMAX THEN
    BEGIN
      SYNC2 := TRUE;
      ERR := ERR + 1;
    END;
  UNTIL SYNC2;

  FOR IBI := 1 TO N DO
  BEGIN
    IB := N + 1 - IBI;
    IB1 := IB + 1;
    FOR II := 1 TO 4 DO
    BEGIN
      I := 5 - II;
      DU[I, IB] := 1.0 * C[I, IB];
      IF I <> 4 THEN
      BEGIN
        J := I + 1;
        WHILE J <= 4 DO
        BEGIN
          DU[I, IB] := DU[I, IB] - B[I, J, IB, I2] * DU[J, IB];
          J := J + 1;
        END;
      END; { IF I <> 4 }
      IF IB <> N THEN
      BEGIN
        J := 1;
        WHILE J <= 4 DO
        BEGIN
          DU[I, IB] := DU[I, IB] - B[I, J, IB, I3] * DU[J, IB1];
          J := J + 1;
        END;
      END; { IF IB <> N }
    END; { FOR II }
  END; { FOR IBI }

  PUTINT(ERR, IPTR);
  PUTINT(SCNT1, IPTR);
  K := 128;
  GETINF(DU, K);
END; { COPROC }

**** IDATF ASSUMED EXTERNAL
**** IDATA ASSUMED EXTERNAL
**** PUTINT ASSUMED EXTERNAL
**** GETINF ASSUMED EXTERNAL

BEGIN
  COPROC;
END.

**** NO ERROR(S) AND NO WARNING(S) DETECTED
**** 175 LINES 5 PROCEDURES
**** 1076 PCODE INSTRUCTIONS

Dual-Processor Block Tridiagonal Solver; Serial Back Substitution, Processor 2 


PROGRAM SOLVE;

TYPE
  RVECT = ARRAY [1..4, 1..32] OF REAL;
  AMAT = ARRAY [1..4, 1..4, 1..32, 1..3] OF REAL;

VAR
  B : AMAT;
  C, DU : RVECT;
  SYNC1, SYNC2 : BOOLEAN;

PROCEDURE SETAS (IOFFST : INTEGER); FORWARD;
PROCEDURE PUTINT (IVAL : INTEGER; VAR IPTR : INTEGER); FORWARD;

PROCEDURE COPROC;

CONST
  CMAX = 1000000;

VAR
  A : AMAT;
  F : RVECT;
  BP, BR : REAL;
  N, I, J, IB, IP, I1 : INTEGER;
  I2, I3, II1, K, J1 : INTEGER;
  II, JB, IB1, IBI : INTEGER;
  SCNT2, ERR, IPTR : INTEGER;

BEGIN
  SETAS(16*300000);
  SYNC2 := TRUE;
  IPTR := 0;
  SCNT2 := 0;
  ERR := 0;
  II := 2;
  I1 := 1;
  I2 := 2;
  I3 := 3;
  N := 30;
  K := 128;

  FOR IB := 1 TO N DO
    FOR IP := 1 TO 4 DO
    BEGIN
      SCNT2 := 0;
      REPEAT { SYNCHRONIZE WITH PROCESSOR 1 }
        SCNT2 := SCNT2 + 1;
        IF SCNT2 > CMAX THEN
        BEGIN
          SYNC1 := TRUE;
          ERR := ERR + 1;
        END;
      UNTIL SYNC1;
      SYNC1 := FALSE;

      BP := B[IP, IP, IB, I2];
      J := IP + 1;
      WHILE J <= 4 DO
      BEGIN
        B[IP, J, IB, I2] := B[IP, J, IB, I2] / BP;
        J := J + 2;
      END;
      J := 2;
      WHILE J <= 4 DO
      BEGIN
        B[IP, J, IB, I3] := B[IP, J, IB, I3] / BP;
        J := J + 2;
      END;

      IF IP <> 4 THEN
      BEGIN
        II1 := IP + 1;
        FOR I := II1 TO 4 DO
        BEGIN
          BR := B[I, IP, IB, I2];
          J := IP + 1;
          WHILE J <= 4 DO
          BEGIN
            B[I, J, IB, I2] := B[I, J, IB, I2] - BR * B[IP, J, IB, I2];
            J := J + 2;
          END;
          J := 2;
          WHILE J <= 4 DO
          BEGIN
            B[I, J, IB, I3] := B[I, J, IB, I3] - BR * B[IP, J, IB, I3];
            J := J + 2;
          END;
        END; { FOR I }
      END; { IF IP }
      IF IB <> N THEN
      BEGIN
        FOR I := 1 TO 4 DO
        BEGIN
          IB1 := IB + 1;
          BR := B[I, IP, IB1, I1];
          J := IP + 1;
          WHILE J <= 4 DO
          BEGIN
            B[I, J, IB1, I1] := B[I, J, IB1, I1] - BR * B[IP, J, IB, I2];
            J := J + 2;
          END;
          J := 2;
          WHILE J <= 4 DO
          BEGIN
            B[I, J, IB1, I2] := B[I, J, IB1, I2] - BR * B[IP, J, IB, I3];
            J := J + 2;
          END;
        END; { FOR I }
      END; { IF IB }
      SYNC2 := TRUE;
    END; { FOR IP }

  PUTINT(ERR, IPTR);
  PUTINT(SCNT2, IPTR);
END; { COPROC }

**** PUTINT ASSUMED EXTERNAL
**** SETAS ASSUMED EXTERNAL

BEGIN
  COPROC;
END.

**** NO ERROR(S) AND NO WARNING(S) DETECTED
**** 127 LINES 3 PROCEDURES
**** 717 PCODE INSTRUCTIONS


Dual-Processor Block Tridiagonal Solver; Column Sweep Serial Back Substitution, Processor 1 


PROGRAM SOLVE;

TYPE
  RVECT = ARRAY [1..4, 1..32] OF REAL;
  AMAT = ARRAY [1..4, 1..4, 1..32, 1..3] OF REAL;

VAR
  B : AMAT;
  C, DU : RVECT;
  SYNC1, SYNC2 : BOOLEAN;

PROCEDURE GETINF (VAR ADDR : RVECT; NUMEL : INTEGER); FORWARD;
PROCEDURE PUTINT (IVAL : INTEGER; VAR IPTR : INTEGER); FORWARD;
PROCEDURE SYNCRO (VAR EFLG, CNT : INTEGER; MAXCNT : INTEGER; VAR SFLG : BOOLEAN);
  FORWARD;
PROCEDURE IDATA (VAR MATRIXA : AMAT; VCNT : INTEGER); FORWARD;
PROCEDURE IDATF (VAR MATRIXF : RVECT; VCNT1 : INTEGER); FORWARD;

PROCEDURE COPROC;

CONST
  CMAX = 1000000;

VAR
  A : AMAT;
  F : RVECT;
  BP, BR : REAL;
  N, I, J, IB, IP, I1 : INTEGER;
  I2, I3, II1, K, J1 : INTEGER;
  II, JB, IB1, IBI : INTEGER;
  ERR, SCNT1, IPTR : INTEGER;

BEGIN
  SYNC1 := FALSE;
  IPTR := 0;
  SCNT1 := 0;
  ERR := 0;
  I1 := 1;
  I2 := 2;
  I3 := 3;
  N := 30;

  IDATA(B, 1536);
  IDATF(C, 128);

  SYNC1 := TRUE;

  FOR IB := 1 TO N DO
    FOR IP := 1 TO 4 DO
    BEGIN
      SYNCRO(ERR, SCNT1, CMAX, SYNC2);
      { SCNT1 := 0; }
      { REPEAT } { SYNCHRONIZE WITH PROCESSOR 2 }
      {   SCNT1 := SCNT1 + 1; }
      {   IF SCNT1 > CMAX THEN }
      {   BEGIN }
      {     SYNC2 := TRUE; }
      {     ERR := ERR + 1; }
      {   END; }
      { UNTIL SYNC2; }
      SYNC2 := FALSE;

      BP := B[IP, IP, IB, I2];
      J := IP;
      WHILE J <= 4 DO
      BEGIN
        B[IP, J, IB, I2] := B[IP, J, IB, I2] / BP;
        J := J + 2;
      END;
      J := 1;
      WHILE J <= 4 DO
      BEGIN
        B[IP, J, IB, I3] := B[IP, J, IB, I3] / BP;
        J := J + 2;
      END;
      C[IP, IB] := C[IP, IB] / BP;

      IF IP <> 4 THEN
      BEGIN
        II1 := IP + 1;
        FOR I := II1 TO 4 DO
        BEGIN
          BR := B[I, IP, IB, I2];
          J := IP;
          WHILE J <= 4 DO
          BEGIN
            B[I, J, IB, I2] := B[I, J, IB, I2] - BR * B[IP, J, IB, I2];
            J := J + 2;
          END;
          J := 1;
          WHILE J <= 4 DO
          BEGIN
            B[I, J, IB, I3] := B[I, J, IB, I3] - BR * B[IP, J, IB, I3];
            J := J + 2;
          END;
          C[I, IB] := C[I, IB] - BR * C[IP, IB];
        END; { FOR I }
      END; { IF IP }
      IF IB <> N THEN
      BEGIN
        FOR I := 1 TO 4 DO
        BEGIN
          IB1 := IB + 1;
          BR := B[I, IP, IB1, I1];
          J := IP;
          WHILE J <= 4 DO
          BEGIN
            B[I, J, IB1, I1] := B[I, J, IB1, I1] - BR * B[IP, J, IB, I2];
            J := J + 2;
          END;
          J := 1;
          WHILE J <= 4 DO
          BEGIN
            B[I, J, IB1, I2] := B[I, J, IB1, I2] - BR * B[IP, J, IB, I3];
            J := J + 2;
          END;
          C[I, IB1] := C[I, IB1] - BR * C[IP, IB];
        END; { FOR I }
      END; { IF IB }
      SYNC1 := TRUE;
    END; { FOR IP }

  SYNCRO(ERR, SCNT1, CMAX, SYNC2);
  SYNC2 := FALSE;
  { SCNT1 := 0; }
  { REPEAT } { SYNCHRONIZE WITH PROCESSOR 2 }
  {   SCNT1 := SCNT1 + 1; }
  {   IF SCNT1 > CMAX THEN }
  {   BEGIN }
  {     SYNC2 := TRUE; }
  {     ERR := ERR + 1; }
  {   END; }
  { UNTIL SYNC2; }

  { COLUMN SWEEP BACKSUBSTITUTION ALGORITHM }
  FOR IB := 1 TO N DO
  BEGIN
    I := 1;
    WHILE I <= 4 DO
    BEGIN
      C[I, IB] := 1.0 * C[I, IB];
      I := I + 2;
    END;
  END;
  SYNC1 := TRUE;
  SYNCRO(ERR, SCNT1, CMAX, SYNC2);
  SYNC2 := FALSE;
  FOR IBI := 1 TO N DO
  BEGIN
    IB := N + 1 - IBI;
    IB1 := IB - 1;
    DU[4, IB] := C[4, IB];
    FOR I := 4 DOWNTO 1 DO
    BEGIN
      I1 := I - 1;
      IF (NOT ((IB = 1) AND (I = 1)) AND (I <> 1)) THEN
      BEGIN
        J := I1;
        WHILE J >= 1 DO
        BEGIN
          C[J, IB] := C[J, IB] - B[J, I, IB, 2] * DU[I, IB];
          J := J - 2;
        END;
      END;
      IF IB <> 1 THEN
      BEGIN
        J := 1;
        WHILE J <= 4 DO
        BEGIN
          C[J, IB1] := C[J, IB1] - B[J, I, IB1, 3] * DU[I, IB];
          J := J + 2;
        END;
      END;
      SYNCRO(ERR, SCNT1, CMAX, SYNC2);
      SYNC2 := FALSE;
      DU[I1, IB] := C[I1, IB];
      SYNC1 := TRUE;
    END;
  END;

  { OLD BACKSUBSTITUTION }
  { FOR IBI := 1 TO N DO
    BEGIN
      IB := N + 1 - IBI;
      IB1 := IB + 1;
      FOR II := 1 TO 4 DO
      BEGIN
        I := 5 - II;
        DU[I, IB] := 1.0 * C[I, IB];
        IF I <> 4 THEN
        BEGIN
          J := I + 1;
          WHILE J <= 4 DO
          BEGIN
            DU[I, IB] := DU[I, IB] - B[I, J, IB, I2] * DU[J, IB];
            J := J + 1;
          END;
        END;
        IF IB <> N THEN
        BEGIN
          J := 1;
          WHILE J <= 4 DO
          BEGIN
            DU[I, IB] := DU[I, IB] - B[I, J, IB, I3] * DU[J, IB1];
            J := J + 1;
          END;
        END;
      END;
    END; }

  PUTINT(ERR, IPTR);
  PUTINT(SCNT1, IPTR);
  K := 128;
  GETINF(DU, K);
END; { COPROC }

**** IDATF ASSUMED EXTERNAL
**** IDATA ASSUMED EXTERNAL
**** SYNCRO ASSUMED EXTERNAL
**** PUTINT ASSUMED EXTERNAL
**** GETINF ASSUMED EXTERNAL

BEGIN
  COPROC;
END.

**** NO ERROR(S) AND NO WARNING(S) DETECTED
**** 229 LINES 6 PROCEDURES
**** 1199 PCODE INSTRUCTIONS


Dual-Processor Block Tridiagonal Solver; Column Sweep Back Substitution, Processor 2 


PROGRAM SOLVE;

TYPE
  RVECT = ARRAY[1..4,1..32] OF REAL;
  AMAT  = ARRAY[1..4,1..4,1..32,1..3] OF REAL;

VAR
  B            : AMAT;
  C, DU        : RVECT;
  SYNC1, SYNC2 : BOOLEAN;

PROCEDURE SETAS(IOFFST : INTEGER); FORWARD;

PROCEDURE PUTINT(IVAR : INTEGER; VAR IPTR : INTEGER); FORWARD;

PROCEDURE SYNCRO(VAR EFLG, CNT : INTEGER; MAXCNT : INTEGER; VAR SFLG : BOOLEAN);
  FORWARD;

PROCEDURE COPROC;

CONST
  CMAX = 1000000;

VAR
  A                   : AMAT;
  F                   : RVECT;
  BP, BR              : REAL;
  N, I, J, IB, IP, I1 : INTEGER;
  I2, I3, III, K, J1  : INTEGER;
  II, JB, IB1, IBI    : INTEGER;
  SCNT2, ERR, IPTR    : INTEGER;

BEGIN
  SETAS(16*300000);
  SYNC2 := TRUE;
  IPTR := 0;
  SCNT2 := 0;
  ERR := 0;
  N := 2;
  I1 := 1;
  I2 := 2;
  I3 := 3;
  N := 30;
  K := 128;

  FOR IB := 1 TO N DO
    FOR IP := 1 TO 4 DO
    BEGIN
      SYNCRO(ERR, SCNT2, CMAX, SYNC1);
      { SCNT2 := 0; }
      { REPEAT } { SYNCHRONIZE WITH PROCESSOR 1 }
      { SCNT2 := SCNT2 + 1; }
      { IF SCNT2 > CMAX THEN }
      { BEGIN }
      { SYNC1 := TRUE; }
      { ERR := ERR + 1; }
      { END; }
      { UNTIL SYNC1; }
      SYNC1 := FALSE;

      BP := B[IP,IP,IB,I2];
      J := IP + 1;
      WHILE J <= 4 DO
      BEGIN
        B[IP,J,IB,I2] := B[IP,J,IB,I2] / BP;
        J := J + 2;
      END;
      J := 2;
      WHILE J <= 4 DO
      BEGIN
        B[IP,J,IB,I3] := B[IP,J,IB,I3] / BP;
        J := J + 2;
      END;

      IF IP <> 4 THEN
      BEGIN
        II := IP + 1;
        FOR I := II TO 4 DO
        BEGIN
          BR := B[I,IP,IB,I2];
          J := IP + 1;
          WHILE J <= 4 DO
          BEGIN
            B[I,J,IB,I2] := B[I,J,IB,I2] - BR * B[IP,J,IB,I2];
            J := J + 2;
          END;
          J := 2;
          WHILE J <= 4 DO
          BEGIN
            B[I,J,IB,I3] := B[I,J,IB,I3] - BR * B[IP,J,IB,I3];
            J := J + 2;
          END;
        END; { FOR I }
      END; { IF IP }
      IF IB <> N THEN
      BEGIN
        FOR I := 1 TO 4 DO
        BEGIN
          IB1 := IB + 1;
          BR := B[I,IP,IB1,I1];
          J := IP + 1;
          WHILE J <= 4 DO
          BEGIN
            B[I,J,IB1,I1] := B[I,J,IB1,I1] - BR * B[IP,J,IB,I2];
            J := J + 2;
          END;
          J := 2;
          WHILE J <= 4 DO
          BEGIN
            B[I,J,IB1,I2] := B[I,J,IB1,I2] - BR * B[IP,J,IB,I3];
            J := J + 2;
          END;
        END; { FOR I }
      END; { IF IB }
      SYNC2 := TRUE;
    END; { FOR IP }

  SYNCRO(ERR, SCNT2, CMAX, SYNC1);
  SYNC1 := FALSE;

  { COLUMN SWEEP BACKSUBSTITUTION ALGORITHM }
  FOR IB := 1 TO N DO
  BEGIN
    I := 2;
    WHILE I <= 4 DO
    BEGIN
      C[I,IB] := -1.0 * C[I,IB];
      I := I + 2;
    END;
  END;
  SYNC2 := TRUE;
  SYNCRO(ERR, SCNT2, CMAX, SYNC1);
  SYNC1 := FALSE;

  FOR IBI := 1 TO N DO
  BEGIN
    IB := N + 1 - IBI;
    IB1 := IB - 1;
    DU[4,IB] := C[4,IB];
    FOR I := 4 DOWNTO 1 DO
    BEGIN
      I1 := I - 1;
      IF (NOT ((IB = 1) AND (I = 1)) AND (I <> 1)) THEN
      BEGIN
        J := I1 - 1;
        WHILE J >= 1 DO
        BEGIN
          C[J,IB] := C[J,IB] - B[J,I,IB,2] * DU[I,IB];
          J := J - 2;
        END;
      END;
      IF IB <> 1 THEN
      BEGIN
        J := 2;
        WHILE J <= 4 DO
        BEGIN
          C[J,IB1] := C[J,IB1] - B[J,I,IB1,3] * DU[I,IB];
          J := J + 2;
        END;
      END;
      SYNC2 := TRUE;
      SYNCRO(ERR, SCNT2, CMAX, SYNC1);
      SYNC1 := FALSE;
    END;
  END;

  PUTINT(ERR, IPTR);
  PUTINT(SCNT2, IPTR);
END; { COPROC }

**** SYNCRO ASSUMED EXTERNAL
**** PUTINT ASSUMED EXTERNAL
**** SETAS ASSUMED EXTERNAL

BEGIN
  COPROC;
END.

**** NO ERROR(S) AND NO WARNING(S) DETECTED
**** 180 LINES 4 PROCEDURES
**** 1040 PCODE INSTRUCTIONS


Three-Processor Block Tridiagonal Solver; Processor 1 


PROGRAM SOLVE;

TYPE
  INT5  = ARRAY[1..5] OF INTEGER;
  REAL4 = ARRAY[1..4] OF REAL;
  RVECT = ARRAY[1..4,1..32] OF REAL;
  AMAT  = ARRAY[1..4,1..4,1..32,1..3] OF REAL;

VAR
  B     : AMAT;
  C, DU : RVECT;

PROCEDURE GETINF(VAR ADDR : RVECT; NUMEL : INTEGER); FORWARD;

PROCEDURE PUTINT(IVAL : INTEGER; VAR IPTR : INTEGER); FORWARD;

PROCEDURE SYNCRO2(VAR SYNCINF : INT5); FORWARD;

PROCEDURE IDATA(VAR MATRIXA : AMAT; VCNT : INTEGER); FORWARD;

PROCEDURE IDATF(VAR MATRIXF : RVECT; VCNT1 : INTEGER); FORWARD;

PROCEDURE COPROC;

CONST
  CMAX = 1000000;

VAR
  AI2, AI1, BI2, BI3  : REAL4;
  F                   : RVECT;
  BP, BR              : REAL;
  N, I, J, IB, IP, I1 : INTEGER;
  I2, I3, III, K, J1  : INTEGER;
  II, JB, IB1, IBI    : INTEGER;
  ERR, SCNT1, IPTR    : INTEGER;
  SYNCTAB             : INT5;

BEGIN
  IPTR := 0;
  SYNCTAB[3] := CMAX;
  SYNCTAB[4] := 0;
  SYNCTAB[1] := 1;
  SYNCTAB[5] := 3;
  I1 := 1;
  I2 := 2;
  I3 := 3;
  N := 30;

  IDATA(B, 1536);
  IDATF(C, 128);

  FOR IB := 1 TO N DO
    FOR IP := 1 TO 4 DO
    BEGIN
      SYNCRO2(SYNCTAB);
      BP := B[IP,IP,IB,I2];
      II := IP + 1;
      IB1 := IB + 1;

      J := IP;
      WHILE J <= 4 DO
      BEGIN
        BI2[J] := B[IP,J,IB,I2];
        J := J + 3;
      END;

      J := 1;
      IF ((IP = 1) OR (IP = 4))
        THEN K := 4
        ELSE K := 3;

      WHILE J <= 4 DO
      BEGIN
        BI3[J] := B[IP,J,IB,I3];
        J := J + K;
      END;

      J := IP;
      WHILE J <= 4 DO
      BEGIN
        BI2[J] := BI2[J] / BP;
        B[IP,J,IB,I2] := BI2[J];
        J := J + 3;
      END;

      J := 1;
      WHILE J <= 4 DO
      BEGIN
        BI3[J] := BI3[J] / BP;
        B[IP,J,IB,I3] := BI3[J];
        J := J + K;
      END;

      IF IP <> 4 THEN
      BEGIN
        FOR I := II TO 4 DO
        BEGIN
          BR := B[I,IP,IB,I2];

          J := IP + 3;
          WHILE J <= 4 DO
          BEGIN
            B[I,J,IB,I2] := B[I,J,IB,I2] - BR * BI2[J];
            J := J + 3;
          END;

          J := 1;
          WHILE J <= 4 DO
          BEGIN
            B[I,J,IB,I3] := B[I,J,IB,I3] - BR * BI3[J];
            J := J + K;
          END;

        END; { FOR I }
      END; { IF IP }

      IF IB <> N THEN
      BEGIN
        FOR I := 1 TO 4 DO
        BEGIN
          BR := B[I,IP,IB1,I1];

          J := IP + 3;
          WHILE J <= 4 DO
          BEGIN
            B[I,J,IB1,I1] := B[I,J,IB1,I1] - BR * BI2[J];
            J := J + 3;
          END;

          J := 1;
          WHILE J <= 4 DO
          BEGIN
            B[I,J,IB1,I2] := B[I,J,IB1,I2] - BR * BI3[J];
            J := J + K;
          END;

          IF ((IP = 1) OR (IP = 4)) THEN
            C[I,IB1] := C[I,IB1] - BR * C[IP,IB];

        END; { FOR I }
      END; { IF IB }
    END; { FOR IP }

  SYNCRO2(SYNCTAB);

  { COLUMN SWEEP BACKSUBSTITUTION ALGORITHM }
  FOR IB := 1 TO N DO
  BEGIN
    I := 1;
    WHILE I <= 4 DO
    BEGIN
      C[I,IB] := -1.0 * C[I,IB];
      I := I + 3;
    END;
  END;

  SYNCRO2(SYNCTAB);

  FOR IBI := 1 TO N DO
  BEGIN
    IB := N + 1 - IBI;
    IB1 := IB - 1;
    DU[4,IB] := C[4,IB];
    FOR I := 4 DOWNTO 1 DO
    BEGIN
      I1 := I - 1;
      IF (NOT ((IB = 1) AND (I = 1)) AND (I <> 1)) THEN
      BEGIN
        J := I1;




        WHILE J >= 1 DO
        BEGIN
          C[J,IB] := C[J,IB] - B[J,I,IB,2] * DU[I,IB];
          J := J - 3;
        END;
      END;
      IF IB <> 1 THEN
      BEGIN
        J := 2;
        WHILE J <= 4 DO
        BEGIN
          C[J,IB1] := C[J,IB1] - B[J,I,IB1,3] * DU[I,IB];
          J := J + 3;
        END;
      END;
      SYNCRO2(SYNCTAB);
      DU[I1,IB] := C[I1,IB];
      SYNCRO2(SYNCTAB);
    END;
  END;

  ERR := SYNCTAB[1];
  PUTINT(ERR, IPTR);
  SCNT1 := SYNCTAB[2];
  PUTINT(SCNT1, IPTR);
  K := 128;
  GETINF(DU, K);

END; { COPROC }

**** IDATF ASSUMED EXTERNAL
**** IDATA ASSUMED EXTERNAL
**** SYNCRO2 ASSUMED EXTERNAL
**** PUTINT ASSUMED EXTERNAL
**** GETINF ASSUMED EXTERNAL

BEGIN
  COPROC;
END.

**** NO ERROR(S) AND NO WARNING(S) DETECTED
**** 208 LINES 6 PROCEDURES
**** 1150 PCODE INSTRUCTIONS


Three-Processor Block Tridiagonal Solver; Processor 2 


PROGRAM SOLVE;

TYPE
  INT5  = ARRAY[1..5] OF INTEGER;
  REAL4 = ARRAY[1..4] OF REAL;
  RVECT = ARRAY[1..4,1..32] OF REAL;
  AMAT  = ARRAY[1..4,1..4,1..32,1..3] OF REAL;

VAR
  B     : AMAT;
  C, DU : RVECT;

PROCEDURE SETAS(IOFFST : INTEGER); FORWARD;

PROCEDURE PUTINT(IVAL : INTEGER; VAR IPTR : INTEGER); FORWARD;

PROCEDURE SYNCRO2(VAR SYNCINF : INT5); FORWARD;

PROCEDURE COPROC;

CONST
  CMAX = 1000000;

VAR
  AI2, AI1, BI2, BI3  : REAL4;
  F                   : RVECT;
  BP, BR              : REAL;
  N, I, J, IB, IP, I1 : INTEGER;
  I2, I3, III, K, J1  : INTEGER;
  II, JB, IB1, IBI    : INTEGER;
  ERR, SCNT1, IPTR    : INTEGER;
  SYNCTAB             : INT5;

BEGIN
  SETAS(16*300000);
  IPTR := 0;
  SYNCTAB[3] := CMAX;
  SYNCTAB[4] := 0;
  SYNCTAB[1] := 2;
  SYNCTAB[5] := 3;
  I1 := 1;
  I2 := 2;
  I3 := 3;
  N := 30;

  FOR IB := 1 TO N DO
    FOR IP := 1 TO 4 DO
    BEGIN
      SYNCRO2(SYNCTAB);
      II := IP + 1;
      IB1 := IB + 1;
      BP := B[IP,IP,IB,I2];

      { FOR I := II TO 4 DO
          AI2[I] := B[I,IP,IB,I2];
        FOR I := 1 TO 4 DO
          AI1[I] := B[I,IP,IB1,I1]; }

      J := IP + 1;
      WHILE J <= 4 DO
      BEGIN
        BI2[J] := B[IP,J,IB,I2];
        J := J + 3;
      END;

      J := 2;
      IF ((IP = 1) OR (IP = 4))
        THEN K := 2
        ELSE K := 3;
      WHILE J <= 4 DO
      BEGIN
        BI3[J] := B[IP,J,IB,I3];
        J := J + K;
      END;

      J := IP + 1;
      WHILE J <= 4 DO
      BEGIN
        BI2[J] := BI2[J] / BP;
        B[IP,J,IB,I2] := BI2[J];